DIRICHLET PRIOR INFORMATION*
by

Gregory Campbell and Myles Hollander
Purdue University and Florida State University


1. Introduction and Summary

In this paper we treat the topic of incomplete information regarding the
parameter $\alpha$ of a Dirichlet process prior. Ferguson [4] introduced the
Dirichlet process for the incorporation of prior information into the analysis
of nonparametric problems. The process can be viewed as a prior on the set of
all distributions on a measurable space $(X, \mathcal{A})$. The process is
parametrized by $\alpha$, a non-negative, non-null finite measure on
$(X, \mathcal{A})$. (In this paper we restrict to situations where
$X = \mathbb{R}$, the real line, and $\mathcal{A} = \mathcal{B}$, the Borel
$\sigma$-field.) Typically, to use estimators which are Bayes with respect to a
Dirichlet process with parameter $\alpha$, the statistician must provide a
complete specification of the measure $\alpha$. This paper develops some
estimators that rely only on partial information concerning $\alpha$.

One approach to incomplete information concerning $\alpha$ is that initiated
by Doksum [3]. Doksum assumes that $\alpha(t_i, t_{i+1}]$, $i = 1, \ldots, k-1$,
are known with $\alpha(\mathbb{R} - (t_1, t_k]) = 0$. That is, the values that
$\alpha$ assigns to the $k-1$ intervals $(t_1, t_2], \ldots, (t_{k-1}, t_k]$
are known, and $\alpha(\mathbb{R}) = \alpha(t_1, t_k]$. In Section 3 of this
paper, Doksum's technique for obtaining a mixed rule (Definition 3.1) is
considered and shown also to yield a G-minimax rule (Definition 3.2) for a
suitable choice of G.

*Research sponsored by the Air Force Office of Scientific Research, AFSC, USAF,
under Grants AFOSR-74-2581B and AFOSR-76-3109. The United States Government
is authorized to reproduce and distribute reprints for governmental purposes.

Section 4 considers the estimation of $\Delta = \Pr\{X \le Y\}$, when
$X_1, \ldots, X_m$ is a sample from a Dirichlet process with parameter $\alpha$
and $Y_1, \ldots, Y_n$ is a sample from a second, independent Dirichlet process
with parameter $\beta$. A mixed rule is found to be

$$\hat{\Delta}_k = \frac{\sum_{i=1}^{k-1} \{\alpha(t_1, t_i] + M_1 + \cdots + M_{i-1} + \tfrac{1}{2}\alpha(t_i, t_{i+1}] + \tfrac{1}{2}M_i\}\{\beta(t_i, t_{i+1}] + N_i\}}{(\alpha(\mathbb{R}) + m)(\beta(\mathbb{R}) + n)}, \tag{1.1}$$

where $M_i$ and $N_i$ denote the number of X's and Y's, respectively, in the
interval $(t_i, t_{i+1}]$.
In Section 5 the problem considered is the estimation of the rank order
(Definition 5.1) of $X_1$ among $X_1, \ldots, X_n$ based on $X_1, \ldots, X_r$
($r < n$), where $X_1, \ldots, X_n$ is a sample of size $n$ from a Dirichlet
process on $(\mathbb{R}, \mathcal{B})$ with parameter $\alpha$. For the case
where $\alpha$ is completely specified, a Bayes estimator was developed by
Campbell and Hollander [2]. Here a mixed rule is obtained for the case where
$\alpha$ is not completely known but instead only the $\alpha(t_i, t_{i+1}]$
values, $i = 1, \ldots, k-1$ (with
$\alpha(\mathbb{R}) = \sum_{i=1}^{k-1} \alpha(t_i, t_{i+1}]$), are specified.

Section 2 contains some Dirichlet process preliminaries.


2. Dirichlet Process Preliminaries

Let $G(\alpha, \beta)$ denote the gamma distribution with shape parameter
$\alpha \ge 0$ and scale parameter $\beta > 0$. If $\alpha = 0$, the
distribution is degenerate at 0. If $\alpha > 0$, it has a density with respect
to Lebesgue measure on the real line given by

$$f(z \mid \alpha, \beta) = (\Gamma(\alpha)\beta^{\alpha})^{-1} z^{\alpha - 1} \exp(-z/\beta)\, I_{(0,\infty)}(z), \tag{2.1}$$

where $I_A(\cdot)$ denotes the indicator function of the set $A$.

Definition 2.1. The Dirichlet distribution with parameter
$(\alpha_1, \ldots, \alpha_k)$, where $\alpha_j \ge 0$ for all $j$ and
$\sum_{j=1}^{k} \alpha_j > 0$, denoted $\mathcal{D}(\alpha_1, \ldots, \alpha_k)$,
is defined as the distribution of $(Y_1, \ldots, Y_k)$, where

$$Y_j = Z_j \Big/ \sum_{i=1}^{k} Z_i, \qquad j = 1, \ldots, k,$$

and the $Z_i$'s are independent random variables with gamma distributions
$G(\alpha_i, 1)$, for $i = 1, \ldots, k$.

If $\alpha_j > 0$ for all $j = 1, \ldots, k$, the $(k-1)$-dimensional
distribution of $(Y_1, \ldots, Y_{k-1})$ is absolutely continuous with respect
to Lebesgue measure on the $(k-1)$-dimensional Euclidean space with density

$$f(y_1, \ldots, y_{k-1} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \left(\prod_{i=1}^{k-1} y_i^{\alpha_i - 1}\right) \left(1 - \sum_{i=1}^{k-1} y_i\right)^{\alpha_k - 1} I_S(y_1, \ldots, y_{k-1}), \tag{2.2}$$

where $S$ is the simplex

$$S = \Big\{(y_1, \ldots, y_{k-1}) : y_i \ge 0,\ i = 1, \ldots, k-1,\ \sum_{i=1}^{k-1} y_i \le 1\Big\}.$$

For $k = 2$, (2.2) becomes the density of a beta distribution with parameters
$\alpha_1$ and $\alpha_2$.
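
The construction in Definition 2.1 can be tried out directly. The following is a minimal Python sketch, assuming only the standard library; the function name `sample_dirichlet` and the example parameters are illustrative and not part of the paper.

```python
import random


def sample_dirichlet(alphas, rng=random):
    """Draw one Dirichlet(alphas) vector by normalizing independent gammas.

    A zero parameter yields a coordinate that is identically 0, matching the
    convention that G(0, 1) is degenerate at 0.
    """
    if any(a < 0 for a in alphas) or sum(alphas) <= 0:
        raise ValueError("need alpha_j >= 0 for all j and a positive sum")
    # Z_j ~ G(alpha_j, 1); gammavariate needs a strictly positive shape,
    # so the degenerate case alpha_j = 0 is handled by hand.
    z = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alphas]
    total = sum(z)
    return [zj / total for zj in z]


random.seed(0)  # fixed seed only to make the illustration repeatable
y = sample_dirichlet([2.0, 0.0, 3.0])
```

The returned coordinates are non-negative and sum to one, and the coordinate with parameter 0 is exactly 0.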


Proposition 2.2. (Wilks [6], p. 179). The $(r_1, \ldots, r_\ell)$ moment of
the Dirichlet distribution $\mathcal{D}(\alpha_1, \ldots, \alpha_k)$ is, for
$\ell \le k - 1$ and $r_i$ a non-negative integer such that $r_i$ positive
implies $\alpha_i$ positive, for $i = 1, \ldots, \ell$:

$$\mu_{r_1, \ldots, r_\ell} = \frac{\Gamma(\alpha_1 + r_1) \cdots \Gamma(\alpha_\ell + r_\ell)\, \Gamma(\alpha)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_\ell)\, \Gamma(\alpha + r)}, \tag{2.3}$$

where $\alpha = \sum_{i=1}^{k} \alpha_i$ and $r = \sum_{j=1}^{\ell} r_j$.


For $k$ a positive integer, let $y^{[k]}$ denote the ascending factorial
$y(y + 1) \cdots (y + k - 1)$ and define $y^{[0]} = 1$. Then it is convenient to
rewrite (2.3) as

$$\mu_{r_1, \ldots, r_\ell} = \frac{\alpha_1^{[r_1]} \cdots \alpha_\ell^{[r_\ell]}}{\alpha^{[r]}}. \tag{2.4}$$

For a more complete treatment of the Dirichlet distribution, the reader
is referred to Wilks [6].
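
The ascending-factorial form of the moment formula is easy to evaluate exactly. The sketch below is illustrative only: the function names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is a convenience choice, not something taken from the paper.

```python
from fractions import Fraction


def ascending_factorial(y, k):
    """Return y^[k] = y (y + 1) ... (y + k - 1), with y^[0] = 1."""
    out = Fraction(1)
    for j in range(k):
        out *= Fraction(y) + j
    return out


def dirichlet_moment(alphas, powers):
    """E[Y_1^r_1 ... Y_l^r_l] for the Dirichlet distribution with parameters alphas.

    powers lists r_1, ..., r_l for the first l coordinates (l <= len(alphas)).
    """
    if len(powers) > len(alphas):
        raise ValueError("more powers than Dirichlet coordinates")
    numerator = Fraction(1)
    for a, r in zip(alphas, powers):
        numerator *= ascending_factorial(a, r)
    return numerator / ascending_factorial(sum(alphas), sum(powers))


mean_y1 = dirichlet_moment([1, 2, 3], [1])        # alpha_1 / alpha
cross_12 = dirichlet_moment([1, 2, 3], [1, 1])    # alpha_1 alpha_2 / (alpha (alpha + 1))
```

With parameters (1, 2, 3) the first moment of $Y_1$ is 1/6 and the mixed moment of $Y_1 Y_2$ is 2/42 = 1/21, as the formula predicts.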

Let $(X, \mathcal{A})$ denote a measurable space. A particular stochastic
process $\{P(A) : A \in \mathcal{A}\}$ is defined.

Definition 2.3. (Ferguson [4]). Let $\alpha$ denote a non-negative, non-null,
finite measure on $(X, \mathcal{A})$. $P$ is a Dirichlet process on
$(X, \mathcal{A})$ with parameter $\alpha$ if, for every $k = 1, 2, \ldots$,
and every measurable partition $(B_1, \ldots, B_k)$ of $X$, the distribution of
$(P(B_1), \ldots, P(B_k))$ is Dirichlet with parameter
$(\alpha(B_1), \ldots, \alpha(B_k))$.

Ferguson [4] shows, using the Kolmogorov extension theorem, that there
exists a probability measure, call it $Q_\alpha$, on
$([0,1]^{\mathcal{A}}, \mathcal{BF}^{\mathcal{A}})$ yielding the above
finite-dimensional marginal Dirichlet distributions. Here $[0,1]^{\mathcal{A}}$
represents the space of all functions from $\mathcal{A}$ into $[0,1]$ (which
thus includes $\mathcal{P}$, the set of all probability measures on
$(X, \mathcal{A})$) and $\mathcal{BF}^{\mathcal{A}}$ is the $\sigma$-field
generated by the field of cylinder sets.

Definition 2.4. (Ferguson [4]). The collection of random variables
$X_1, \ldots, X_n$ is said to be a sample of size $n$ from the Dirichlet
process $P$ on $(X, \mathcal{A})$ with parameter $\alpha$ if, for any
$m = 1, 2, \ldots$, and measurable sets $A_1, \ldots, A_m, C_1, \ldots, C_n$,

$$\Pr\{X_1 \in C_1, \ldots, X_n \in C_n \mid P(A_1), \ldots, P(A_m), P(C_1), \ldots, P(C_n)\} = \prod_{j=1}^{n} P(C_j), \tag{2.5}$$

where Pr denotes probability.

Intuitively, $X_1, \ldots, X_n$ is a sample of size $n$ from a Dirichlet
process if $P$ is randomly selected according to $Q_\alpha$ and then, given
$P$, $X_1, \ldots, X_n$ is a sample from the probability measure $P$.

Using Kolmogorov's extension theorem once again, Ferguson shows that
there exists a probability measure on
$(X^n \times [0,1]^{\mathcal{A}}, \mathcal{A}^n \times \mathcal{BF}^{\mathcal{A}})$
with marginal probability on $([0,1]^{\mathcal{A}}, \mathcal{BF}^{\mathcal{A}})$
given by the above $Q_\alpha$. Since this probability also depends on $\alpha$,
it will also be called $Q_\alpha$. It can be shown (cf. Berk and Savage [1])
that $Q_\alpha$ concentrates all its mass on
$(X^n \times \mathcal{P}, \mathcal{A}^n \times \sigma(\mathcal{P}))$, where
$\sigma(\mathcal{P})$ is the inherited $\sigma$-field for $\mathcal{P}$ from
$\mathcal{BF}^{\mathcal{A}}$. Thus, $P$ is a random probability measure. If
$F(x) = P(-\infty, x]$, then $F$ is a random distribution function, a sample
path of the Dirichlet process.


Theorem 2.5. (Ferguson [4]). If $P$ is a Dirichlet process on
$(X, \mathcal{A})$ with parameter $\alpha$, and if $X_1, \ldots, X_n$ is a
sample of size $n$ from $P$, then the conditional distribution of $P$ given
$X_1, \ldots, X_n$ is also a Dirichlet process on $(X, \mathcal{A})$ with
parameter $\alpha + \sum_{i=1}^{n} \delta_{X_i}$, where $\delta_z$ denotes the
measure with mass one at $z$, zero elsewhere.
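
When $\alpha$ is only used through its values on intervals $(t_i, t_{i+1}]$, the update in Theorem 2.5 amounts to adding interval counts to the prior weights, and the posterior mean of $P(t_i, t_{i+1}]$ is the updated weight divided by the updated total mass. A minimal Python sketch follows; the function names, the half-open interval convention in the code, and the example numbers are illustrative assumptions.

```python
from bisect import bisect_left


def posterior_interval_weights(prior_weights, cut_points, sample):
    """Add interval counts to the prior weights alpha(t_i, t_{i+1}].

    prior_weights[i] is the prior mass of (cut_points[i], cut_points[i + 1]].
    """
    if len(prior_weights) != len(cut_points) - 1:
        raise ValueError("need one weight per interval")
    counts = [0] * len(prior_weights)
    for x in sample:
        if not (cut_points[0] < x <= cut_points[-1]):
            raise ValueError("observation outside (t_1, t_k]")
        # bisect_left gives the first index j with cut_points[j] >= x,
        # so x lies in (cut_points[j - 1], cut_points[j]].
        counts[bisect_left(cut_points, x) - 1] += 1
    return [a + c for a, c in zip(prior_weights, counts)]


def posterior_mean_probabilities(prior_weights, cut_points, sample):
    """Posterior mean of P on each interval: updated weight / updated total."""
    post = posterior_interval_weights(prior_weights, cut_points, sample)
    total = sum(post)
    return [w / total for w in post]


weights = posterior_interval_weights([1, 1, 2], [0, 1, 2, 3], [0.5, 1.0, 2.5, 2.7])
means = posterior_mean_probabilities([1, 1, 2], [0, 1, 2, 3], [0.5, 1.0, 2.5, 2.7])
```

In this example the prior weights (1, 1, 2) become (3, 1, 4), so the posterior mean probabilities are (0.375, 0.125, 0.5).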

3. Mixed Rules and G-Minimax Rules
Doksum [3] considered the problem of partial prior information in the
decision theoretic framework, in particular, as applied to nonparametric
problems with Dirichlet parameters incompletely specified. It is assumed
throughout this section that $\alpha(t_i, t_{i+1}]$, $i = 1, \ldots, k - 1$,
are known and that $\alpha(\mathbb{R} - (t_1, t_k]) = 0$.

Let $\Omega$ be a class of distribution functions on $(\mathbb{R}, \mathcal{B})$,
where $\mathbb{R}$ is the real line and $\mathcal{B}$ the Borel $\sigma$-field.
Suppose that $Q$, the probability on $\Omega$, is not completely specified but
that, for fixed real numbers $t_1, \ldots, t_k$, the distribution of
$(F(t_1), \ldots, F(t_k))$ is known, where $F$ is a random distribution
function from $Q$. Let $L(F, a)$ denote the loss function for action $a$ for
distribution function $F \in \Omega$ and $d$ a decision rule from the
observation space $\mathbb{R}$ to the action space $A$. Then the risk function
$R(F, d)$, associated with distribution function $F \in \Omega$ when decision
rule $d$ is taken, is defined by

$$R(F, d) = E\, L(F, d(X)),$$

where the expectation is over $X$, where $X$ has distribution $F$. The maximum
risk, $R(d)$, is given by

$$R(d) = \sup_{F \in \Omega} R(F, d).$$

A rule (if one exists) which minimizes the maximum risk over all decision
rules is called a minimax rule. The average risk, $R(Q, d)$, for completely
specified probability $Q$ on $\Omega$, is given by

$$R(Q, d) = \int_{\Omega} R(F, d)\, dQ(F).$$

A rule (if one exists) is called a Bayes rule if it minimizes the average
risk over all decision rules.

Definition 3.1. (Doksum [3]). Let
$\Omega(q, k) = \{F \in \Omega : F(t_i) = q_i,\ i = 1, \ldots, k\}$ for
$q = (q_1, \ldots, q_k) \in \mathbb{R}^k$. Let the measure $\lambda$ on
$\mathbb{R}^k$, dependent on $Q$, be given by

$$\lambda(q; Q, k) = Q\{F \in \Omega : F(t_i) \le q_i,\ i = 1, \ldots, k\}.$$

$\lambda$ is then the distribution of $(F(t_1), \ldots, F(t_k))$ under the
probability measure $Q$. The average maximum risk, $r_k(Q, d)$, associated with
probability $Q$ and decision rule $d$, is

$$r_k(Q, d) = \int_{\mathbb{R}^k} \Big[\sup_{F \in \Omega(q, k)} R(F, d)\Big]\, d\lambda(q).$$

A rule is said to be mixed (or mixed Bayes-minimax) if it minimizes the
average maximum risk over all decision rules.

Definition 3.2. Let $G$ denote a set of probability measures on $\Omega$.
Define the G-maximum risk for rule $d$ as $\sup_{Q \in G} R(Q, d)$. A rule (if
it exists) is said to be G-minimax if the rule minimizes the G-maximum risk
over all decision rules.

If $Q_F$ denotes the probability on $\Omega$ which is the distribution function
$F$ with probability one, then a G-minimax rule is minimax if $G$ contains
$Q_F$ for all $F \in \Omega$.

A natural  question  is  what  are  the  relationships  between  these  various 
risks  and  their  associated  rules.  Doksum  [3]  provides  a partial  answer. 

Lemma 3.3. (Doksum [3]). For any decision rule $d$ and prior $Q$ on $\Omega$,
the following hold:

(i) $R(d) \ge r_k(Q, d) \ge R(Q, d)$ $(k \ge 1)$;

(ii) if $\{\Pi_m : t_{m,1} < \cdots < t_{m,k_m}\}$, $m = 1, 2, \ldots$, is a
sequence of partitions such that each partition is a refinement of the
previous one, then

$$r_{k_m}(Q, d) \ge r_{k_\ell}(Q, d) \quad \text{for } m < \ell.$$
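
The ordering of the maximum risk, the average maximum risk, and the average risk can be checked on a small finite example. The sketch below is a toy illustration under stated assumptions: the class of distributions is replaced by four hypothetical labels `F1` to `F4`, the risks of one fixed rule are made-up numbers, and the function `label` stands in for the map from $F$ to $(F(t_1), \ldots, F(t_k))$.

```python
from collections import defaultdict


def maximum_risk(risks):
    """R(d): the largest risk over every F in the (finite) class."""
    return max(risks.values())


def average_risk(risks, prior):
    """R(Q, d): the prior-weighted average of the risks."""
    return sum(w * risks[f] for f, w in prior.items())


def average_maximum_risk(risks, prior, label):
    """r_k(Q, d): average over q of the worst risk among all F with label(F) == q."""
    worst = {}
    for f, risk in risks.items():
        q = label(f)
        worst[q] = max(worst.get(q, float("-inf")), risk)
    mass = defaultdict(float)
    for f, w in prior.items():
        mass[label(f)] += w
    return sum(w * worst[q] for q, w in mass.items())


# Hypothetical toy numbers: risks of one fixed rule d and a prior Q.
risks = {"F1": 1.0, "F2": 3.0, "F3": 2.0, "F4": 5.0}
prior = {"F1": 0.4, "F2": 0.3, "F3": 0.2, "F4": 0.1}

coarse = {"F1": "a", "F2": "a", "F3": "b", "F4": "b"}.get  # two cells
fine = lambda f: f                                           # a refinement: one F per cell

r_max = maximum_risk(risks)
r_coarse = average_maximum_risk(risks, prior, coarse)
r_fine = average_maximum_risk(risks, prior, fine)
r_bayes = average_risk(risks, prior)
```

Here the maximum risk is 5, the coarse labelling gives an average maximum risk of about 3.6, and the finest labelling reproduces the average risk of about 2.2, in line with parts (i) and (ii) of the lemma.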

Definition 3.4. The carrier of a given distribution is the smallest compact
set whose probability under the given distribution is one. For example,
for $F \in \Omega$, the carrier of $F$, denoted $C(F)$, is the smallest compact
set on $\mathbb{R}$ whose probability under distribution $F$ is one.


Definition 3.5. The support of $\Omega$, $S(\Omega)$, is given by

$$S(\Omega) = \bigcup_{F \in \Omega} C(F).$$


Proposition 3.6. If $Q \in G$, then, for every $d$,

$$R(d) \ge \sup_{Q' \in G} R(Q', d) \ge R(Q, d).$$

Proof. Clearly, $\sup_{Q' \in G} R(Q', d) \ge R(Q, d)$ since $Q \in G$. But
also, for $Q_F$ as defined previously, if
$G^* = G \cup \{Q_F : F \in \Omega\}$, then

$$\sup_{Q' \in G^*} R(Q', d) = R(d) \ge \sup_{Q' \in G} R(Q', d). \quad ||$$

Doksum defines a rule, which, in some cases, is a mixed rule. Let
$t_1 = \inf\{t : t \in S(\Omega)\}$ and let $t_k = \sup\{t : t \in S(\Omega)\}$
and assume $-\infty < t_1 < t_k < \infty$. Let $F_{q,k}$ denote the polygonal
distribution function with $F(t_i) = q_i$ for $i = 1, \ldots, k$ and $F_{q,k}$
linear on $[t_i, t_{i+1}]$ for $i = 1, \ldots, k - 1$. Let $F_k$ denote the
random distribution function obtained by letting $q$ in $F_{q,k}$ have
distribution $\lambda = \lambda(\cdot\,; Q, k)$, for $Q$ a prior on $\Omega$.
Assume $F_k$ is measurable. Let $Q_k$ denote the distribution of $F_k$ and
$d_k$ the Bayes rule for $Q_k$ (if it exists).


Theorem 3.7. (Doksum [3]). If $F_{q,k} \in \Omega$ for almost all $q$ in
$C(\lambda)$, if such a $d_k$ exists, and if $r_k(Q, d_k) = R(Q_k, d_k)$, then
$d_k$ is a mixed procedure.

Theorem 3.7 provides a method for obtaining a mixed rule; i.e., one finds
the Bayes rule for prior $Q_k$, and, if the hypotheses are satisfied, the
Bayes rule is a mixed rule.

Let $G_k = \{Q$ a probability on $\Omega$:
$(F(t_1), F(t_2) - F(t_1), \ldots, F(t_k) - F(t_{k-1}))$ has a fixed, known
distribution$\}$.

Proposition 3.8. For any decision rule $d$ and for $Q \in G_k$,

$$r_k(Q, d) \ge \sup_{Q' \in G_k} R(Q', d).$$

Proof. For $Q', Q'' \in G_k$, $\lambda(q; Q', k) = \lambda(q; Q'', k)$ for all
$q \in \mathbb{R}^k$, in that $\lambda$ depends on $F$ at $t_1, \ldots, t_k$,
for $F$ a random distribution function. Therefore,
$r_k(Q', d) = r_k(Q'', d)$ for all rules $d$. Taking sups over $Q' \in G_k$
on both sides of the inequality

$$r_k(Q', d) \ge R(Q', d),$$

obtained by Proposition 3.6, yields, for any $Q'' \in G_k$,

$$r_k(Q'', d) \ge \sup_{Q' \in G_k} R(Q', d).$$

In particular, $Q \in G_k$ and the proof is complete. ||

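Corollary 3.9. If Proposition 3.8 holds and if, for distribution $Q$ on
$\Omega$, the Bayes risk equals the mixed risk associated with mixed rule $d$,
then $d$ is also a $G_k$-minimax rule.

Proof. For a Bayes rule $\delta$,

$$R(d) \ge r_k(Q, d) \ge \sup_{Q' \in G_k} R(Q', d) \ge R(Q, d) \ge R(Q, \delta),$$

by Lemma 3.3 and Propositions 3.6 and 3.8. Now note by assumption,
$r_k(Q, d) = R(Q, \delta)$, so $\sup_{Q' \in G_k} R(Q', d) = R(Q, \delta)$.
Since every rule $d^*$ satisfies
$\sup_{Q' \in G_k} R(Q', d^*) \ge R(Q, d^*) \ge R(Q, \delta)$, no rule has a
smaller $G_k$-maximum risk. Therefore, $d$ is $G_k$-minimax. ||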

The significance of Corollary 3.9 is that, in certain special instances,
a $G_k$-minimax rule can be found by finding a Bayes rule.

Let $\{\Pi_k : t_{k,1} < \cdots < t_{k,2^k}\}$, $k = 1, 2, \ldots$, be a
sequence of partitions such that each partition is a refinement of the
preceding one and such that $|t_{k,i+1} - t_{k,i}| \to 0$ as $k \to \infty$.
Further, suppose the $t$'s are from the space $[0,1]$. Let $C[0,1]$ denote the
continuous distribution functions defined on $[0,1]$. For partition $\Pi_k$,
let $d_k$ denote a mixed rule for the given probability $Q$ on $C[0,1]$.

Theorem 3.10. (Doksum [3]). Let $F_{q,k}$ denote the polygonal distribution
function with $F(t_{k,i}) = q_i$ and $F_{q,k}$ linear on
$[t_{k,i}, t_{k,i+1}]$ for $i = 1, \ldots, 2^k - 1$. If $F_{q,k} \in \Omega$
for almost all $q$ in $C(\lambda)$, if $d_k$ denotes the mixed rule for
probability $Q$ on $\Omega$ associated with partition $\Pi_k$, and if $d$
is a Bayes rule such that $d$ has continuous bounded risk $R(Q, d)$, then, for
$\Omega \subset C[0,1]$,

$$\lim_{k \to \infty} r_k(Q, d_k) = \lim_{k \to \infty} R(Q, d_k) = R(Q, d).$$

Theorem 3.11. Under the conditions of Theorem 3.10, if $G_k$-minimax rules
$\delta_k$ exist for $k = 1, 2, \ldots$, then, for $Q \in G_k$ for
$k = 1, 2, \ldots$,

$$\lim_{k \to \infty} \sup_{Q' \in G_k} R(Q', \delta_k) = R(Q, d).$$

Proof. It follows from Propositions 3.6 and 3.8 and the definition of a
$G_k$-minimax rule that

$$R(Q, d) \le R(Q, \delta_k) \le \sup_{Q' \in G_k} R(Q', \delta_k) \le \sup_{Q' \in G_k} R(Q', d_k) \le r_k(Q, d_k).$$

Thus, by Theorem 3.10,

$$\lim_{k \to \infty} \sup_{Q' \in G_k} R(Q', \delta_k) = R(Q, d). \quad ||$$

The importance of Theorem 3.11 is that, if $G_k$-minimax rules exist and
the conditions of the theorem are satisfied, the associated $G_k$-minimax
risk approaches the Bayes risk.

The application of this development to the Dirichlet situation will
become apparent immediately. Let $G_k = \{Q$ a probability measure on
$\Omega$: $(F(t_2) - F(t_1), \ldots, F(t_k) - F(t_{k-1}))$ has a Dirichlet
distribution with parameters
$(\alpha(t_1, t_2], \ldots, \alpha(t_{k-1}, t_k])\}$. Then $G_k$-minimax rules
are exactly those rules for which $\alpha$ is known only on $(k - 1)$
intervals. The search for $G_k$-minimax rules will be conducted by means of
Corollary 3.9. The behavior of such rules as $k \to \infty$, under the
conditions enumerated, is given in Theorem 3.11.

The remaining two sections contain applications of this development.
Section 4 treats estimation of $\Pr(X \le Y)$ under incomplete Dirichlet prior
information. Section 5 considers estimation of a rank order under incomplete
Dirichlet prior information.

4. Estimation of Pr{X ≤ Y} Under Partial Prior Information
Consider the problem of estimating $\Pr\{X \le Y\}$ in the two sample situation
under incomplete Dirichlet prior information. In particular, assume
$X_1, \ldots, X_m$ is a sample of size $m$ from a Dirichlet process on
$(\mathbb{R}, \mathcal{B})$ with parameter $\alpha$ and $Y_1, \ldots, Y_n$ a
sample of size $n$ from a second Dirichlet process (independent of the first
process) on $(\mathbb{R}, \mathcal{B})$ with parameter $\beta$. Further,
assume that $t_1, \ldots, t_k$ are fixed such that $\alpha(t_i, t_{i+1}]$ and
$\beta(t_i, t_{i+1}]$ are known for $i = 1, \ldots, k - 1$ and that
$\alpha(\mathbb{R} - (t_1, t_k]) = \beta(\mathbb{R} - (t_1, t_k]) = 0$. The
parameter of interest is $\Delta(F, G) = \Pr\{X \le Y\} = \int F\, dG$ where
$F$ is the random distribution function from the first Dirichlet process and
$G$ the random distribution function from the second process. Let $F_k$ and
$G_k$ denote the polygonal random distribution functions with

$$F_k(t_i) = F(t_i), \qquad G_k(t_i) = G(t_i)$$

for $i = 1, \ldots, k$ and $F_k$ and $G_k$ linear on $[t_i, t_{i+1}]$ for
$i = 1, \ldots, k - 1$. Then

$$\Delta(F_k, G_k) = \tfrac{1}{2} \sum_{i=1}^{k-1} [F(t_{i+1}) + F(t_i)][G(t_{i+1}) - G(t_i)].$$

For squared error loss function, the Bayes estimate $\hat{\Delta}_k$ of
$\Delta(F_k, G_k)$ is

$$\hat{\Delta}_k = E_\pi\, \Delta(F_k, G_k),$$

where $\pi$ denotes that $F(t)$ is a Dirichlet process with updated parameter
$\alpha + \sum_{i=1}^{m} \delta_{X_i}$ and $G(t)$ is a Dirichlet process with
updated parameter $\beta + \sum_{j=1}^{n} \delta_{Y_j}$.

Let $p_i = F(t_{i+1}) - F(t_i)$ and $p'_i = G(t_{i+1}) - G(t_i)$ for
$i = 1, \ldots, k - 1$. By Theorem 2.5 and Definition 2.4,
$p = (p_1, \ldots, p_{k-1})$ has a Dirichlet distribution with parameters
$\{\alpha(t_i, t_{i+1}] + M_i\}_{i=1}^{k-1}$ and
$p' = (p'_1, \ldots, p'_{k-1})$ a Dirichlet distribution with parameters
$\{\beta(t_i, t_{i+1}] + N_i\}_{i=1}^{k-1}$, where $M_i$ and $N_i$ denote
the number of X's and Y's, respectively, which fall into $(t_i, t_{i+1}]$,
$i = 1, \ldots, k - 1$. Since $F(t_1) = 0$ with probability one,
$\Delta(F_k, G_k) = \sum_{i=1}^{k-1} (p_1 + \cdots + p_{i-1} + \tfrac{1}{2} p_i)\, p'_i$,
and each factor may be replaced by its posterior mean. It is easy to see by
independence of the processes, therefore, that $\hat{\Delta}_k$ is given by the
right hand-side of (1.1).
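
The estimator on the right-hand side of (1.1) needs only the interval weights of the two priors and the interval counts of the two samples. The following Python sketch computes it under that reading; the function name `delta_hat` and the argument layout (one entry per interval $(t_i, t_{i+1}]$) are illustrative assumptions.

```python
def delta_hat(alpha_w, beta_w, m_counts, n_counts):
    """Estimate Pr{X <= Y} from interval weights and interval counts.

    alpha_w[i], beta_w[i]  : prior masses of the i-th interval under alpha, beta
    m_counts[i], n_counts[i]: numbers of X's and Y's falling in that interval
    """
    if not (len(alpha_w) == len(beta_w) == len(m_counts) == len(n_counts)):
        raise ValueError("all inputs need one entry per interval")
    alpha_total = sum(alpha_w) + sum(m_counts)  # alpha(R) + m
    beta_total = sum(beta_w) + sum(n_counts)    # beta(R) + n
    below = 0.0   # updated alpha-mass strictly below the current interval
    total = 0.0
    for a, b, m_i, n_i in zip(alpha_w, beta_w, m_counts, n_counts):
        here = a + m_i                           # updated alpha-mass of this interval
        total += (below + 0.5 * here) * (b + n_i)
        below += here
    return total / (alpha_total * beta_total)


symmetric = delta_hat([1, 2, 3], [1, 2, 3], [2, 0, 1], [2, 0, 1])
separated = delta_hat([0, 0], [0, 0], [3, 0], [0, 2])
```

Two sanity checks follow from the formula: identical weights and counts for both samples give exactly 1/2, and with vanishing prior mass, all X's in the lowest interval and all Y's in a higher one give 1.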

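The estimator $\hat{\Delta}_k$ may be rewritten as

$$\begin{aligned}
\hat{\Delta}_k ={}& \alpha_m \beta_n \sum_{i=1}^{k-1} \frac{\alpha(t_1, t_i] + \tfrac{1}{2}\alpha(t_i, t_{i+1}]}{\alpha(\mathbb{R})} \cdot \frac{\beta(t_i, t_{i+1}]}{\beta(\mathbb{R})} \\
&+ (1 - \alpha_m)\beta_n \sum_{i=1}^{k-1} \frac{M_1 + \cdots + M_{i-1} + \tfrac{1}{2}M_i}{m} \cdot \frac{\beta(t_i, t_{i+1}]}{\beta(\mathbb{R})} \\
&+ \alpha_m (1 - \beta_n) \sum_{i=1}^{k-1} \frac{\alpha(t_1, t_i] + \tfrac{1}{2}\alpha(t_i, t_{i+1}]}{\alpha(\mathbb{R})} \cdot \frac{N_i}{n} \\
&+ (1 - \alpha_m)(1 - \beta_n) \sum_{i=1}^{k-1} \frac{M_1 + \cdots + M_{i-1} + \tfrac{1}{2}M_i}{m} \cdot \frac{N_i}{n},
\end{aligned} \tag{4.1}$$

where $\alpha_m = \alpha(\mathbb{R})/(\alpha(\mathbb{R}) + m)$ and
$\beta_n = \beta(\mathbb{R})/(\beta(\mathbb{R}) + n)$. Note that this estimator
with the squared error loss function is both a mixed rule (by Theorem
3.7) and a $G_k$-minimax rule (by Corollary 3.9) for
$\Omega = \{(F, G)$: $p$ and $p'$ are independent Dirichlet distributions with
parameters $(\alpha(t_1, t_2], \ldots, \alpha(t_{k-1}, t_k])$ and
$(\beta(t_1, t_2], \ldots, \beta(t_{k-1}, t_k])$, respectively$\}$.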

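As the $t_i$'s become dense, $\hat{\Delta}_k$ is seen to approach Ferguson's
[4] estimator for $\Pr(X \le Y)$ for complete Dirichlet prior information. As
$\alpha(\mathbb{R})$ and $\beta(\mathbb{R}) \to 0$, $\hat{\Delta}_k$ approaches
the Mann-Whitney U statistic for grouped data (as given in Putter [5]):

$$\sum_{i=1}^{k-1} \frac{M_1 + \cdots + M_{i-1} + \tfrac{1}{2}M_i}{m} \cdot \frac{N_i}{n}.$$

As $\alpha(\mathbb{R})$ and $\beta(\mathbb{R})$ get large,

$$\hat{\Delta}_k \to \sum_{i=1}^{k-1} \frac{\alpha(t_1, t_i] + \tfrac{1}{2}\alpha(t_i, t_{i+1}]}{\alpha(\mathbb{R})} \cdot \frac{\beta(t_i, t_{i+1}]}{\beta(\mathbb{R})}.$$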
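
The four-term decomposition and its two limits can be checked numerically. The Python sketch below is illustrative: the function names, the scaling factor `c` applied to the prior weights, and the example numbers are assumptions made for the demonstration, and the code requires strictly positive total prior masses and sample sizes.

```python
def half_cumulative(weights):
    """Return w_1 + ... + w_{i-1} + w_i / 2 for each interval i."""
    out, below = [], 0.0
    for w in weights:
        out.append(below + 0.5 * w)
        below += w
    return out


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def delta_hat_decomposed(alpha_w, beta_w, m_counts, n_counts):
    """Four-term form: prior/prior, data/prior, prior/data and data/data parts."""
    alpha_R, beta_R = sum(alpha_w), sum(beta_w)
    m, n = sum(m_counts), sum(n_counts)
    alpha_m = alpha_R / (alpha_R + m)   # weight given to the prior on the X side
    beta_n = beta_R / (beta_R + n)      # weight given to the prior on the Y side
    prior_x = [h / alpha_R for h in half_cumulative(alpha_w)]
    data_x = [h / m for h in half_cumulative(m_counts)]
    prior_y = [b / beta_R for b in beta_w]
    data_y = [c / n for c in n_counts]
    return (alpha_m * beta_n * dot(prior_x, prior_y)
            + (1 - alpha_m) * beta_n * dot(data_x, prior_y)
            + alpha_m * (1 - beta_n) * dot(prior_x, data_y)
            + (1 - alpha_m) * (1 - beta_n) * dot(data_x, data_y))


# Hypothetical example with three intervals.
a_w, b_w = [1.0, 2.0, 1.0], [2.0, 1.0, 1.0]
mc, nc = [3, 1, 0], [0, 2, 2]

grouped_mw = dot([h / sum(mc) for h in half_cumulative(mc)], [c / sum(nc) for c in nc])
prior_only = dot([h / sum(a_w) for h in half_cumulative(a_w)], [b / sum(b_w) for b in b_w])


def scaled(c):
    """Estimator after multiplying both prior measures by the factor c."""
    return delta_hat_decomposed([c * w for w in a_w], [c * w for w in b_w], mc, nc)
```

Evaluating `scaled(1e-9)` gives a value close to `grouped_mw`, and `scaled(1e9)` gives a value close to `prior_only`, which mirrors the two limiting statements above.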

The estimator would be useful, for example, in the following situation.
Suppose there are two middle-sized towns for which one wishes to compare the
cholesterol rates, in particular to estimate $\Pr(X \le Y)$ where $X$ is the
cholesterol level of a randomly selected person from town A and $Y$ is the
cholesterol level of a randomly selected person in town B. Town B could
be undergoing a program designed to lower cholesterol rates with town A
serving as a control. There is prior knowledge about the cholesterol levels
in the two towns. The prior knowledge is quantified by specifying the
weights $\alpha(t_i, t_{i+1}]$ and $\beta(t_i, t_{i+1}]$ for
$i = 1, \ldots, k - 1$. The values $\alpha(\mathbb{R})$ and
$\beta(\mathbb{R})$ reflect the degrees of confidence held in these weights.
The estimator $\hat{\Delta}_k$ is then a combination of the priors and the
actual data tabulated by intervals.

5. Rank Order Estimation Under Partial Prior Information
Let $X_1, \ldots, X_n$ be a sample of size $n$ from the distribution $F$.
Assuming $F$ is a random distribution function chosen according to the
Dirichlet process prior with parameter $\alpha$, Campbell and Hollander [2]
derive the Bayes estimator of the rank order $G$ of $X_1$ among
$X_1, \ldots, X_n$ based on knowledge of $r$ $(< n)$ observed values
$X_1, \ldots, X_r$. In this Dirichlet model, care must be taken in the
definition of a rank order since the distribution chosen by a Dirichlet
process is discrete with probability one, cf. Berk and Savage [1]. To resolve
the issue of ties with regard to the rank order, average ranks are used.

Definition 5.1. Let $K$, $L$, and $M$ denote the number of observations of
$X_1, X_2, \ldots, X_n$ that are less than, equal to, and greater than $X_1$,
respectively. Then the rank order $G$ of $X_1$ among $X_1, X_2, \ldots, X_n$ is
the average value of the ranks that would be assigned to the $L$ values tied at
$X_1$, in a joint ranking from least to greatest, if those values could be
distinguished; namely,

$$G = \{(K + 1) + (K + 2) + \cdots + (K + L)\}/L = K + (L + 1)/2.$$

Similarly, for $K'$, $L'$, and $M'$ defined, respectively, to be the
number of observations of $X_1, X_2, \ldots, X_r$ less than, equal to, and
greater than $X_1$, the rank order $G'$ of $X_1$ among $X_1, X_2, \ldots, X_r$
is given by $G' = K' + (L' + 1)/2$.
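
The average-rank convention of Definition 5.1 takes only a few lines to compute. The sketch below is illustrative; the function name `rank_order` and the optional `index` argument are assumptions, with the default matching the paper's choice of ranking the first observation.

```python
def rank_order(values, index=0):
    """Average rank of values[index] among all values, ties sharing mean rank."""
    target = values[index]
    k = sum(1 for v in values if v < target)    # strictly smaller observations
    l = sum(1 for v in values if v == target)   # ties, including the target itself
    return k + (l + 1) / 2


g_tied = rank_order([3, 1, 3, 5])   # K = 1, L = 2
g_first = rank_order([2, 7, 9])     # K = 0, L = 1
```

For the sample (3, 1, 3, 5) the two values tied at 3 would occupy ranks 2 and 3, so the rank order of the first observation is 2.5.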

For squared error loss, the Bayes estimator is (see equation (1.2)
of [2])

$$\hat{G} = G' + (n - r)\{\alpha'(-\infty, X_1) + \tfrac{1}{2}\alpha'(\{X_1\})\}/\alpha'(\mathbb{R}), \tag{5.1}$$

where $\mathbb{R}$ is the real line and
$\alpha' = \alpha + \sum_{i=1}^{r} \delta_{X_i}$, where $\delta_z$ is that
measure which concentrates its entire mass of one at the point $z$.
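
For a fully specified $\alpha$, the estimator in (5.1) can be evaluated once $\alpha(-\infty, x)$, $\alpha(\{x\})$, and $\alpha(\mathbb{R})$ are available. The following Python sketch assumes the caller supplies these as two functions and a number; the names `bayes_rank_estimate`, `alpha_below`, and `alpha_atom`, and the uniform example prior, are illustrative and not taken from [2].

```python
def bayes_rank_estimate(observed, n, alpha_below, alpha_atom, alpha_total):
    """Estimate the rank order of observed[0] among n values, r of them observed.

    alpha_below(x): prior mass of the open half-line (-inf, x)
    alpha_atom(x) : prior mass of the single point {x}
    alpha_total   : total prior mass alpha(R)
    """
    r = len(observed)
    x1 = observed[0]
    k_obs = sum(1 for x in observed if x < x1)
    l_obs = sum(1 for x in observed if x == x1)   # includes X_1 itself
    g_prime = k_obs + (l_obs + 1) / 2             # rank order among the observed values
    below = alpha_below(x1) + k_obs               # updated mass of (-inf, X_1)
    atom = alpha_atom(x1) + l_obs                 # updated mass of {X_1}
    return g_prime + (n - r) * (below + 0.5 * atom) / (alpha_total + r)


# Hypothetical prior: total mass 2 spread uniformly over (0, 1), no atoms.
estimate = bayes_rank_estimate(
    observed=[0.5, 0.2, 0.9],
    n=10,
    alpha_below=lambda x: 2.0 * min(max(x, 0.0), 1.0),
    alpha_atom=lambda x: 0.0,
    alpha_total=2.0,
)
```

In this example $G' = 2$, the updated mass below $X_1$ is 2, the updated atom at $X_1$ is 1, and the updated total is 5, so the estimate is 2 + 7(2.5)/5 = 5.5.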

In this section it is assumed that $\alpha$ is not completely known; instead
$\alpha$ is specified only on $k - 1$ intervals $(t_i, t_{i+1}]$ for
$i = 1, \ldots, k - 1$, with
$\alpha(\mathbb{R}) = \sum_{i=1}^{k-1} \alpha(t_i, t_{i+1}]$. Let $F_k$ denote
the polygonal random distribution function with $F_k(t_i) = F(t_i)$,
$i = 1, \ldots, k$, for $F$ a random distribution function from the
Dirichlet process. What is the Bayes estimate for the true rank order $g$ if
$F$ is known and $X_1, \ldots, X_r$ have been observed? It is easy to appeal to
equation (3.3) of [2] for
$\Pr\{(K, L, M) = (k, \ell, m) \mid X_1, \ldots, X_r, F\}$. The mean $G_F$ of
$G$, given $X_1, \ldots, X_r$ and $F$, is obtained from the mean of a
multinomial. We find

$$G_F = G' + (n - r)\{F(X_1^-) + \tfrac{1}{2}[F(X_1) - F(X_1^-)]\}.$$

Restricting just to polygonal distribution functions, it is clear that
$G_{F_k}$ using the squared error loss function depends on $F_k$ not just at
$F_k(t_i)$, $i = 1, \ldots, k$. This makes finding a mixed rule for the rank
order problem most difficult.

Suppose the observations have simply been grouped into intervals where
the values $\alpha$ assigns to these intervals are known. Rather than take the
loss function $L(g, d) = (g - d)^2$, we use the following modified loss
function. For
$g(F, X_1, \ldots, X_r) = G' + (n - r)F(t_i) + \tfrac{1}{2}(n - r)[F(t_{i+1}) - F(t_i)]$,
if $X_1 \in (t_i, t_{i+1}]$, the loss is given by
$[g(F, X_1, \ldots, X_r) - d]^2$. The mixed Bayes-minimax rule is then easily
shown to be

$$\hat{G} = G' + (n - r)[\alpha'(t_1, t_i] + \tfrac{1}{2}\alpha'(t_i, t_{i+1}]]/\alpha'(\mathbb{R}) \tag{5.2}$$

if $X_1 \in (t_i, t_{i+1}]$, $i = 1, \ldots, k - 1$. Note this rule is really
just the Dirichlet estimator with complete information concerning the
parameter $\alpha$, but where $\alpha$ is concentrated at $(k - 1)$ atoms
$\{t_i\}_{i=2}^{k}$ so that $\sum_{i=2}^{k} \alpha(\{t_i\}) = \alpha(\mathbb{R})$.
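
The grouped rule (5.2) uses only interval memberships and interval weights. A minimal Python sketch follows. Two points in it are assumptions rather than statements from the paper: observations are passed as 0-based interval indices, and $G'$ is computed by treating observations in the same interval as tied, which is one natural reading of "grouped" data.

```python
def grouped_rank_estimate(interval_of, n, alpha_w):
    """Grouped-data estimate of the rank order of the first observation.

    interval_of[j]: 0-based index of the interval containing X_{j+1}
    n             : total number of values, of which len(interval_of) are observed
    alpha_w[i]    : prior mass of the i-th interval
    """
    r = len(interval_of)
    i = interval_of[0]
    counts = [0] * len(alpha_w)
    for j in interval_of:
        counts[j] += 1
    post = [a + c for a, c in zip(alpha_w, counts)]   # updated interval masses
    k_obs = sum(counts[:i])                           # observed values in lower intervals
    l_obs = counts[i]                                 # observed values sharing X_1's interval
    g_prime = k_obs + (l_obs + 1) / 2                 # assumption: same interval = tie
    return g_prime + (n - r) * (sum(post[:i]) + 0.5 * post[i]) / sum(post)


grouped = grouped_rank_estimate(interval_of=[1, 0, 2, 1], n=10, alpha_w=[1, 2, 1])
```

In this example the updated masses are (2, 4, 2), $G' = 2.5$, and the correction term is 6(2 + 2)/8 = 3, giving an estimate of 5.5.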

An example in which such an estimator could be of use is as follows.
An automobile driver is passing through a town in need of regular gas.
The driver knows there are $n$ stations in town and all $n$ clearly post their
prices for gas. From past experience at the gas pump, the driver has some
idea of the distribution of prices in the region. The model tends to be
contagious in that if one station advertises a particular price, competition
(or lack of it) will cause others to be more likely to adopt that price also.
Hence the Dirichlet model is not unreasonable here. The problem is for the
driver to estimate, as he passes the $r$-th station, the rank of that station's
gas price among all $n$ stations, on the basis of the prices at the first $r$
stations and his prior information. Then, the estimator $\hat{G}$ could be used,
with the parameter $\alpha(\mathbb{R})$ reflecting the weight or confidence
attached to the driver's prior knowledge of regional gasoline prices.